Add --skip-duplicates flag to wp media import #241
Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
…s for non-skip-duplicates paths Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ile handling Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
…age_file Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
…ithub.io Agent-Logs-Url: https://github.com/wp-cli/media-command/sessions/f70a3d41-3087-4796-9a52-5033dcce663d Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
…instead of raw SQL Agent-Logs-Url: https://github.com/wp-cli/media-command/sessions/0f126b51-1ee9-48c9-9031-92a31261b462 Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces a --skip-duplicates flag to the wp media import command, allowing users to skip files already present in the media library. It includes new Behat tests and logic to detect duplicates based on the _wp_attached_file meta key. Feedback identifies a potential performance improvement by caching basename calculations for remote files. More importantly, the use of LIKE in the duplicate detection query may cause false positives, and the removal of a compatibility check for wp_get_additional_image_sizes() breaks support for WordPress versions older than 4.7.0.
            }
        } else {
            if ( Utils\get_flag_value( $assoc_args, 'skip-duplicates' ) ) {
                $existing = $this->find_duplicate_attachment( (string) explode( '?', Path::basename( $file ), 2 )[0] );
The expression (string) explode( '?', Path::basename( $file ), 2 )[0] is calculated twice for each remote file (here and again on line 653). For better performance in bulk import operations and improved maintainability, consider extracting this to a variable (e.g., $basename) before the duplicate check and reusing it.
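A minimal sketch of the suggested extraction, assuming PHP's built-in `basename()` as a stand-in for `Path::basename()`. The helper name `remote_basename()` is hypothetical and does not come from the pull request.

```php
<?php
/**
 * Hypothetical helper: return a remote file's basename without its query string.
 * PHP's built-in basename() stands in for Path::basename() here (an assumption).
 */
function remote_basename( string $file ): string {
	return (string) explode( '?', basename( $file ), 2 )[0];
}

$file = 'https://example.com/uploads/image.jpg?ver=2';

// Compute the value once per file...
$basename = remote_basename( $file );

// ...then reuse $basename for both the duplicate check and the later import step
// instead of repeating the explode()/basename() expression.
```

Keeping the expression in one place also means a later change to the query-string handling only has to be made once.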
            'key'     => '_wp_attached_file',
            'value'   => '/' . $variant,
            'compare' => 'LIKE',
        );
    }
Using LIKE with a leading slash for duplicate detection can lead to false positives. In WordPress meta_query, a LIKE comparison wraps the value in wildcards, resulting in a query like meta_value LIKE '%/image.jpg%'. This will incorrectly match files like image.jpg.webp or image.jpg.bak when searching for image.jpg, causing the import to be skipped for non-duplicate files. To be more precise, you might consider using raw SQL with $wpdb to match the end of the string exactly (e.g., meta_value = %s OR meta_value LIKE %s with '%/' . $variant) or fetching the results and verifying the basename in PHP.
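The "verify the basename in PHP" option could look roughly like the sketch below. It assumes the `_wp_attached_file` values have already been fetched by the broad `LIKE` query. The function name `filter_exact_basename()` and the sample paths are hypothetical.

```php
<?php
/**
 * Hypothetical post-filter: keep only candidates whose basename equals $variant.
 *
 * @param string[] $attached_files _wp_attached_file values fetched by a broad query.
 * @param string   $variant        File name being checked, e.g. "image.jpg".
 * @return string[] Candidates that are exact basename matches.
 */
function filter_exact_basename( array $attached_files, string $variant ): array {
	return array_values(
		array_filter(
			$attached_files,
			static function ( string $path ) use ( $variant ): bool {
				// Exact comparison rejects "image.jpg.webp" and "image.jpg.bak".
				return basename( $path ) === $variant;
			}
		)
	);
}

// Sample values that a '%/image.jpg%' LIKE comparison would all match.
$candidates = array(
	'2024/05/image.jpg',
	'2024/05/image.jpg.webp',
	'2024/05/image.jpg.bak',
);

$matches = filter_exact_basename( $candidates, 'image.jpg' );
// $matches is array( '2024/05/image.jpg' )
```

This keeps the query itself simple and moves the exactness check to PHP, at the cost of fetching a few extra rows when similarly named files exist.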
…quire-wp-5.3 Behat scenario Agent-Logs-Url: https://github.com/wp-cli/media-command/sessions/43c9f132-60d4-48c3-93e9-1e252883fa46 Co-authored-by: swissspidy <841956+swissspidy@users.noreply.github.com>
- --skip-duplicates flag to import() docblock and @param type hint
- $skips = 0 variable in import()
- _wp_attached_file basename match
- _wp_attached_file basename match (extracted from URL)
- WP_Query with meta_query (OR relation) for duplicate detection instead of raw SQL
- explode() instead of strtok() consistently for URL query-string stripping
- $skips to report_batch_operation_results only when --skip-duplicates is active
- find_duplicate_attachment() private helper method
- basename-scaled.ext from the input basename and include it in the _wp_attached_file search
- --skip-duplicates duplicate check to use the --file_name-resolved name when --file_name is provided, so files previously imported with a custom name are correctly detected as duplicates
- --skip-duplicates (local file, remote file, mixed batch, --file_name-aware duplicate detection)
- @require-wp-5.3 Behat scenario to explicitly exercise scaled-image duplicate detection
- wp-cli.org to wp-cli.github.io (matching commit Update behat-data URLs in tests #250)
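For the basename-scaled.ext item, the sketch below shows one way the search variants could be derived from an input basename. The function name `scaled_variants()` is hypothetical, and the actual logic inside `find_duplicate_attachment()` may differ.

```php
<?php
/**
 * Hypothetical helper: list the file names to look for in _wp_attached_file,
 * i.e. the basename itself plus its "-scaled" counterpart.
 *
 * @return string[]
 */
function scaled_variants( string $basename ): array {
	$name = pathinfo( $basename, PATHINFO_FILENAME );
	$ext  = pathinfo( $basename, PATHINFO_EXTENSION );

	$variants = array( $basename );

	// Only build a "-scaled" name when there is an extension to reattach.
	if ( '' !== $ext ) {
		$variants[] = $name . '-scaled.' . $ext;
	}

	return $variants;
}

$variants = scaled_variants( 'photo.jpg' );
// $variants is array( 'photo.jpg', 'photo-scaled.jpg' )
```

Each variant would then become one clause in the OR-related meta_query described in the checklist.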